Over the last two years, COVID-19 has dominated the news. Yet while around 3% of the world population has had COVID-19, diabetes can be considered an even bigger health problem. According to the International Diabetes Federation (IDF), 463 million adults were living with diabetes in 2019 (around 6-7% of the world population), and this number is forecast to rise to 700 million by 2045. Furthermore, about 90% of diabetes cases are type 2, which results mainly from lifestyle habits rather than genetics. Both types can be managed, and type 2 can often be prevented, with a healthier diet and more physical activity. In addition, according to the WHO, low-income countries are more likely to have a higher diabetes prevalence.

Living in Europe, we observed that diabetes rates differ a lot from one country to another. We therefore wanted to find out whether these rates are indeed linked to a country’s income, and whether and how the nutritional composition of the population’s diet differs between richer and poorer countries.
We therefore set out to answer the following questions:

1. Do European countries that have higher GDPs really have a lower diabetes prevalence?
2. Do European countries that have higher GDPs consume fewer calories?
3. How do the proportions of macronutrients consumed (animal protein, plant protein, fat, carbohydrates) differ between richer and poorer countries?
4. How do these differences relate to the diabetes prevalence in these countries?
5. What is the typical diet observed in richer countries that relates to a lower diabetes prevalence?

To answer our research questions, we used three different datasets. While searching for datasets, we made sure that the years and countries matched across all of them.
The first dataset, downloaded from https://ourworldindata.org/diet-compositions, contains the supply of macronutrients in calories for different countries. We used food supply rather than food consumption data because the latter is harder to find and because supply generally reflects the population’s demand, and therefore its food consumption. The dataset describes the average nutrition of different countries from 1961 to 2013.

It is composed of 8981 observations of 7 variables:

| Variable | Description |
|---|---|
| Entity | Name of the country |
| Code | ISO country code |
| Year | Year of the observation |
| Calories from animal protein (FAO (2017)) | The average per capita supply of calories derived from animal protein, measured in kilocalories per person per day |
| Calories from plant protein (FAO (2017)) | The average per capita supply of calories derived from plant protein, measured in kilocalories per person per day |
| Calories from fat (FAO (2017)) | The average per capita supply of calories derived from fat, measured in kilocalories per person per day |
| Calories from carbohydrates (FAO (2017)) | The average per capita supply of calories derived from carbohydrates, measured in kilocalories per person per day |

The intake of specific macronutrients (carbohydrates, protein and fats) is derived from average food composition factors. These factors are derived and presented in the Food and Agriculture Organisation’s (FAO) Food Balance Sheet Handbook (https://www.fao.org/faostat/en/#data).
We will only focus on observations of European countries in the 2000s.
We used the ISO code because it is standardized worldwide and, unlike country names, cannot be spelled differently from one table to another.
Then, we computed the mean consumption of each type of macronutrient in each country between the years 2000 and 2013.
We then created a new table by adding the sum of these means, i.e. the total calories per person per day for each country, in order to answer our second research question. To make sure that joining the tables goes smoothly, we also removed duplicates and the country name column.
Our assumption was that a country’s wealth may fluctuate over the course of 10 years (e.g. a dip during the economic crisis of 2008), but that an overall mean is sufficient to compare the different countries and their wealth.
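The filtering, averaging and summing steps described above could be sketched as follows. This is a minimal illustration in base R, not the code used for the report: the toy values, the shortened column names and the `european` vector are assumptions made for the example.

```r
# Toy stand-in for the Our World in Data table; all values are made up.
diet <- data.frame(
  Code = c("AUT", "AUT", "BGR", "BGR", "AUT", "USA"),
  Year = c(2000, 2013, 2000, 2013, 1999, 2005),
  cal_prot_animal = c(240, 250, 150, 160, 999, 999),
  cal_prot_plant  = c(168, 170, 166, 170, 999, 999),
  cal_carbs       = c(1830, 1836, 1600, 1612, 999, 999),
  cal_fat         = c(1450, 1458, 840, 852, 999, 999)
)
european <- c("AUT", "BGR")  # hypothetical list of ISO codes to keep
macros <- c("cal_prot_animal", "cal_prot_plant", "cal_carbs", "cal_fat")

# Keep European countries and the 2000-2013 window only.
diet_2000s <- subset(diet, Code %in% european & Year >= 2000 & Year <= 2013)

# Mean of each macronutrient per country over the window.
diet_means <- aggregate(diet_2000s[macros],
                        by = list(country_code = diet_2000s$Code),
                        FUN = mean)

# Total calories per person per day = sum of the four macronutrient means.
diet_means$total_consumption <- rowSums(diet_means[macros])
print(diet_means)
```

Summing the means rather than averaging yearly totals gives the same result here, because every country has one row per year for all four macronutrients.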
We now have a dataframe with the following variables:

| Variable | Description |
|---|---|
| country_code | ISO country code |
| cal_prot_animal | The mean of the calories from animal protein consumed per person in each country in the years 2000-2013 |
| cal_prot_plant | The mean of the calories from plant protein consumed per person in each country in the years 2000-2013 |
| cal_carbs | The mean of the calories from carbohydrates consumed per person in each country in the years 2000-2013 |
| cal_fat | The mean of the calories from fat consumed per person in each country in the years 2000-2013 |
| total_consumption | The total calorie consumption per person, based on the means of the consumption of each type of macronutrient in each country in the years 2000-2013 |

| Country Code | Calories from animal protein | Calories from plant protein | Calories from carbohydrates | Calories from fat | Total consumption |
|---|---|---|---|---|---|
| AUT | 245 | 169 | 1833 | 1454 | 3702 |
| BEL | 238 | 158 | 1856 | 1467 | 3719 |
| BGR | 155 | 168 | 1606 | 846 | 2775 |
| HRV | 168 | 148 | 1691 | 940 | 2946 |
| CYP | 197 | 127 | 1291 | 1019 | 2633 |
| CZE | 218 | 153 | 1728 | 1155 | 3254 |
| DNK | 273 | 157 | 1746 | 1190 | 3366 |
| EST | 212 | 167 | 1967 | 842 | 3188 |
| FIN | 269 | 166 | 1623 | 1177 | 3234 |
| FRA | 293 | 161 | 1611 | 1480 | 3545 |
| DEU | 240 | 159 | 1785 | 1276 | 3460 |
| GRC | 250 | 204 | 1744 | 1338 | 3536 |
| HUN | 193 | 153 | 1560 | 1221 | 3126 |
| IRL | 279 | 174 | 1950 | 1187 | 3590 |
| ITA | 241 | 204 | 1776 | 1390 | 3612 |
| LVA | 203 | 154 | 1687 | 1044 | 3087 |
| LTU | 274 | 198 | 2055 | 858 | 3385 |
| LUX | 288 | 148 | 1752 | 1318 | 3507 |
| MLT | 237 | 204 | 1932 | 994 | 3367 |
| NLD | 292 | 136 | 1599 | 1195 | 3222 |
| POL | 204 | 197 | 1969 | 1035 | 3405 |
| PRT | 275 | 177 | 1824 | 1240 | 3516 |
| ROU | 200 | 220 | 2003 | 916 | 3340 |
| SVK | 140 | 150 | 1610 | 952 | 2853 |
| SVN | 230 | 168 | 1664 | 1067 | 3129 |
| ESP | 279 | 159 | 1481 | 1323 | 3243 |
| SWE | 285 | 143 | 1566 | 1137 | 3131 |
| CHE | 237 | 138 | 1660 | 1392 | 3426 |
| GBR | 232 | 178 | 1748 | 1256 | 3414 |
Our second dataset, downloaded from the portal https://data.worldbank.org, gives us information about the GDP of many countries over the course of 60 years (1960-2020).
It is composed of 266 observations of 65 variables:

| Variable | Description |
|---|---|
| Country Name | Name of the country |
| Country Code | ISO country code |
| Indicator Name | Equal to “GDP in current US$” for every row |
| Indicator Code | Equal to “NY.GDP.MKTP.CD” for every row |

The remaining 61 columns each hold the GDP for one year, from 1960 to 2020. RStudio imported the Excel file as is, so our column names ended up in the third row and columns 3 to 65 were given numbers as names.
We fixed that and kept only the years of interest that we have in common with the other tables, i.e. 2000-2013. We also dropped the Indicator Name and Indicator Code variables, since their values are the same for every row and provide no additional information.
Next, we kept only the European countries, just like in the first table.
In order to join the tables easily, we gathered the columns corresponding to the different years into a single “year” column, so that each row of the dataset holds the GDP of a given country in a given year.
To make the data easier to manipulate, we renamed the variables of this table as well. We also made sure that our numeric variable (GDP) was stored as numeric and not as character, as it was by default. In order to have graphs that are easy to read in the exploratory data analysis, we also decided to divide the avg_gdp column by a billion.
Lastly, we computed the average GDP of each country over the years 2000-2013 in order to be able to plot the different variables together.
| Country Name | Country Code | Average GDP (current US$) |
|---|---|---|
| Austria | AUT | 3.36e+11 |
| Belgium | BEL | 4.07e+11 |
| Bulgaria | BGR | 3.74e+10 |
| Croatia | HRV | 4.81e+10 |
| Cyprus | CYP | 2.02e+10 |
| Czech Republic | CZE | 1.58e+11 |
| Denmark | DNK | 2.75e+11 |
| Estonia | EST | 1.65e+10 |
| Finland | FIN | 2.17e+11 |
| France | FRA | 2.28e+12 |
| Germany | DEU | 3.00e+12 |
| Greece | GRC | 2.46e+11 |
| Hungary | HUN | 1.11e+11 |
| Ireland | IRL | 2.03e+11 |
| Italy | ITA | 1.87e+12 |
| Latvia | LVA | 2.10e+10 |
| Lithuania | LTU | 3.08e+10 |
| Luxembourg | LUX | 4.28e+10 |
| Malta | MLT | 7.27e+09 |
| Netherlands | NLD | 7.22e+11 |
| Poland | POL | 3.65e+11 |
| Portugal | PRT | 2.00e+11 |
| Romania | ROU | 1.25e+11 |
| Slovak Republic | SVK | 7.08e+10 |
| Slovenia | SVN | 3.95e+10 |
| Spain | ESP | 1.18e+12 |
| Sweden | SWE | 4.26e+11 |
| Switzerland | CHE | 4.90e+11 |
| United Kingdom | GBR | 2.42e+12 |
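The reshaping, type conversion and averaging steps described above could look roughly like this. It is a sketch in base R under stated assumptions: the three year columns and their values are made up, and the report itself may well have used other functions for the same steps.

```r
# Toy wide table: one column per year, GDP imported as character (made-up values).
gdp_wide <- data.frame(
  country_name = c("Austria", "Bulgaria"),
  country_code = c("AUT", "BGR"),
  `2000` = c("100", "10"),
  `2001` = c("110", "12"),
  `2002` = c("120", "14"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
year_cols <- setdiff(names(gdp_wide), c("country_name", "country_code"))

# Wide to long: one row per country and year, with GDP converted to numeric.
gdp_long <- data.frame(
  country_name = rep(gdp_wide$country_name, times = length(year_cols)),
  country_code = rep(gdp_wide$country_code, times = length(year_cols)),
  year = rep(as.integer(year_cols), each = nrow(gdp_wide)),
  gdp  = as.numeric(unlist(gdp_wide[year_cols], use.names = FALSE))
)

# Average GDP per country over the selected years.
avg_gdp <- aggregate(gdp ~ country_name + country_code, data = gdp_long, FUN = mean)
names(avg_gdp)[3] <- "avg_gdp"
print(avg_gdp)
```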
We now have a dataframe with the following variables:

| Variable | Description |
|---|---|
| country_name | Name of the country |
| country_code | ISO code of the country |
| avg_gdp | The average GDP of a country over the course of 2000-2013 |

Since we will be comparing the GDP with the calories consumed per person, it could be useful to have the GDP per person for the analysis. This is why we also imported the dataset from https://data.worldbank.org/indicator/SP.POP.TOTL, which describes the evolution of the population per country over 1960-2020.
As this dataset comes from the same source and has the same file type as the GDP data, we can apply the same wrangling steps.
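Once both averages exist, GDP per person is a join followed by a division. The sketch below assumes two already-averaged tables with made-up values; `gdp_per_capita` is a hypothetical column name that does not appear in the report.

```r
# Made-up averages for two countries; the row order differs on purpose.
avg_gdp <- data.frame(country_code = c("AUT", "BGR"),
                      avg_gdp = c(3.2e11, 4.0e10))
avg_pop <- data.frame(country_code = c("BGR", "AUT"),
                      avg_population = c(8.0e6, 8.0e6))

# Join on the ISO code so that each GDP is divided by the matching population.
per_capita <- merge(avg_gdp, avg_pop, by = "country_code")
per_capita$gdp_per_capita <- per_capita$avg_gdp / per_capita$avg_population
print(per_capita)
```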
The third dataset, downloaded from https://www.ncdrisc.org/data-downloads-diabetes.html, gives us information about the age-standardised diabetes prevalence for each country and gender from 1980 to 2014.
It is composed of 14,000 observations of 7 variables:

| Variable | Description |
|---|---|
| Country/Region/World | Name of the country |
| ISO | ISO country code |
| Sex | Gender for which the diabetes prevalence is measured in a certain country at a certain year |
| Year | Year of observation (1980-2014) |
| Age-standardised diabetes prevalence | Diabetes rate considering all ages |
| Lower 95% uncertainty interval | Lower limit of the uncertainty interval for the diabetes rate |
| Upper 95% uncertainty interval | Upper limit of the uncertainty interval for the diabetes rate |

As with the first 2 datasets, we filtered the data to keep only European countries and observations between the years 2000 and 2013.
We also decided not to use the two 95% uncertainty interval variables.
Then, we separated the dataset into two subsets, one with data about men and another with data about women.
We then renamed the variables of these 2 subsets to facilitate joining the tables later on.
Finally, we grouped the observations by country to get the mean diabetes prevalence between 2000 and 2013 for each European country. The table below shows the values for men; the values for women appear in the final joined table.

| Country Code | Diabetes rate (men) |
|---|---|
| AUT | 0.053 |
| BEL | 0.057 |
| BGR | 0.073 |
| CHE | 0.050 |
| CYP | 0.077 |
| CZE | 0.078 |
| DEU | 0.059 |
| DNK | 0.055 |
| ESP | 0.084 |
| EST | 0.071 |
| FIN | 0.066 |
| FRA | 0.071 |
| GBR | 0.063 |
| GRC | 0.069 |
| HRV | 0.071 |
| HUN | 0.080 |
| IRL | 0.069 |
| ITA | 0.065 |
| LTU | 0.078 |
| LUX | 0.068 |
| LVA | 0.071 |
| MLT | 0.088 |
| NLD | 0.052 |
| POL | 0.074 |
| PRT | 0.075 |
| ROU | 0.062 |
| SVK | 0.072 |
| SVN | 0.066 |
| SWE | 0.058 |
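Splitting by sex and averaging per country could be sketched as below, in base R. The toy values, the `"Men"`/`"Women"` labels, the shortened column names and the helper `mean_by_country()` are all assumptions made for illustration.

```r
# Toy long table: one row per country, sex and year (made-up values).
diabetes <- data.frame(
  ISO  = rep(c("AUT", "BGR"), each = 4),
  Sex  = rep(c("Men", "Women"), times = 4),
  Year = rep(c(2000, 2000, 2013, 2013), times = 2),
  prevalence = c(0.05, 0.03, 0.06, 0.04, 0.07, 0.06, 0.08, 0.07)
)

# Hypothetical helper: filter one sex and the study window, then average per country.
mean_by_country <- function(df, sex, new_name) {
  sub <- df[df$Sex == sex & df$Year >= 2000 & df$Year <= 2013, ]
  out <- aggregate(prevalence ~ ISO, data = sub, FUN = mean)
  names(out) <- c("country_code", new_name)
  out
}

men   <- mean_by_country(diabetes, "Men", "prop_men_diabetes")
women <- mean_by_country(diabetes, "Women", "prop_women_diabetes")
print(men)
print(women)
```

Giving the two value columns different names up front avoids a name clash when both subsets are later joined on `country_code`.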
We now have 2 dataframes with the following variables:

| Variable | Description |
|---|---|
| country_code | ISO code of the country |
| prop_men_diabetes or prop_women_diabetes | The average diabetes rate in a country in the 2000-2013 timeframe |

For the last step of our tidying, we joined all four tables into one dataset using the country_code key:

| Country Name | Country Code | Average GDP (current US$) | Men Diabetes | Women Diabetes | Calories from animal protein | Calories from plant protein | Calories from carbohydrates | Calories from fat | Total consumption |
|---|---|---|---|---|---|---|---|---|---|
| Austria | AUT | 3.36e+11 | 0.053 | 0.034 | 245 | 169 | 1833 | 1454 | 3702 |
| Belgium | BEL | 4.07e+11 | 0.057 | 0.039 | 238 | 158 | 1856 | 1467 | 3719 |
| Bulgaria | BGR | 3.74e+10 | 0.073 | 0.064 | 155 | 168 | 1606 | 846 | 2775 |
| Croatia | HRV | 4.81e+10 | 0.071 | 0.059 | 168 | 148 | 1691 | 940 | 2946 |
| Cyprus | CYP | 2.02e+10 | 0.077 | 0.056 | 197 | 127 | 1291 | 1019 | 2633 |
| Czech Republic | CZE | 1.58e+11 | 0.078 | 0.065 | 218 | 153 | 1728 | 1155 | 3254 |
| Denmark | DNK | 2.75e+11 | 0.055 | 0.035 | 273 | 157 | 1746 | 1190 | 3366 |
| Estonia | EST | 1.65e+10 | 0.071 | 0.064 | 212 | 167 | 1967 | 842 | 3188 |
| Finland | FIN | 2.17e+11 | 0.066 | 0.044 | 269 | 166 | 1623 | 1177 | 3234 |
| France | FRA | 2.28e+12 | 0.071 | 0.044 | 293 | 161 | 1611 | 1480 | 3545 |
| Germany | DEU | 3.00e+12 | 0.059 | 0.040 | 240 | 159 | 1785 | 1276 | 3460 |
| Greece | GRC | 2.46e+11 | 0.069 | 0.060 | 250 | 204 | 1744 | 1338 | 3536 |
| Hungary | HUN | 1.11e+11 | 0.080 | 0.063 | 193 | 153 | 1560 | 1221 | 3126 |
| Ireland | IRL | 2.03e+11 | 0.069 | 0.049 | 279 | 174 | 1950 | 1187 | 3590 |
| Italy | ITA | 1.87e+12 | 0.065 | 0.047 | 241 | 204 | 1776 | 1390 | 3612 |
| Latvia | LVA | 2.10e+10 | 0.071 | 0.065 | 203 | 154 | 1687 | 1044 | 3087 |
| Lithuania | LTU | 3.08e+10 | 0.078 | 0.069 | 274 | 198 | 2055 | 858 | 3385 |
| Luxembourg | LUX | 4.28e+10 | 0.068 | 0.039 | 288 | 148 | 1752 | 1318 | 3507 |
| Malta | MLT | 7.27e+09 | 0.088 | 0.066 | 237 | 204 | 1932 | 994 | 3367 |
| Netherlands | NLD | 7.22e+11 | 0.052 | 0.037 | 292 | 136 | 1599 | 1195 | 3222 |
| Poland | POL | 3.65e+11 | 0.074 | 0.066 | 204 | 197 | 1969 | 1035 | 3405 |
| Portugal | PRT | 2.00e+11 | 0.075 | 0.052 | 275 | 177 | 1824 | 1240 | 3516 |
| Romania | ROU | 1.25e+11 | 0.062 | 0.059 | 200 | 220 | 2003 | 916 | 3340 |
| Slovak Republic | SVK | 7.08e+10 | 0.072 | 0.059 | 140 | 150 | 1610 | 952 | 2853 |
| Slovenia | SVN | 3.95e+10 | 0.066 | 0.065 | 230 | 168 | 1664 | 1067 | 3129 |
| Spain | ESP | 1.18e+12 | 0.084 | 0.059 | 279 | 159 | 1481 | 1323 | 3243 |
| Sweden | SWE | 4.26e+11 | 0.058 | 0.040 | 285 | 143 | 1566 | 1137 | 3131 |
| Switzerland | CHE | 4.90e+11 | 0.050 | 0.030 | 237 | 138 | 1660 | 1392 | 3426 |
| United Kingdom | GBR | 2.42e+12 | 0.063 | 0.049 | 232 | 178 | 1748 | 1256 | 3414 |
We did not have any NA values in our tables. We think this is because we spent time gathering quality data that matched in terms of dates and countries.
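The join on country_code and the NA check could be sketched as follows in base R. The two rows reuse values from the tables above; using a full join (`all = TRUE`) is a choice made for this example so that a country missing from any table would show up as NA instead of silently disappearing.

```r
# Two countries, with values taken from the tables above; row order varies on purpose.
gdp   <- data.frame(country_code = c("BGR", "AUT"), avg_gdp = c(3.74e10, 3.36e11))
men   <- data.frame(country_code = c("AUT", "BGR"), prop_men_diabetes = c(0.053, 0.073))
women <- data.frame(country_code = c("AUT", "BGR"), prop_women_diabetes = c(0.034, 0.064))
diet  <- data.frame(country_code = c("AUT", "BGR"), total_consumption = c(3702, 2775))

# Chain the joins: each step merges the running result with the next table.
joined <- Reduce(
  function(x, y) merge(x, y, by = "country_code", all = TRUE),
  list(gdp, men, women, diet)
)

print(joined)
print(anyNA(joined))  # TRUE would flag a country missing from at least one table
```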
First, even though we will be taking the means of the variables with which we are trying to answer our questions, it is interesting to observe their evolution in each country over time. We started with the GDP.
We can see that the GDP of France, Germany, Italy, Spain and the United Kingdom increased markedly between 2000 and 2008.
Now let’s see if there is a relation between the GDP of a country and its diabetes prevalence (men = blue, women = red).
We observe that, apart from 5 outliers, our observations are mostly bunched up on the left of the graph. We decided to exclude these 5 observations to see if we can observe a trend among the other countries. These outliers, as we can see on the previous graph, are the countries that had a big increase in GDP over the 2000-2013 period.
Without the outliers, the picture is a bit clearer: it seems that the richer a country is, the lower its diabetes rate tends to be.
For the second table, we tried to see again if there was a trend in the consumption of different macro-nutrients in the 2000s for each country in our sample.
One difference between countries stands out and seems to be related to wealth: countries with a higher GDP, like Austria, consume more fat on average, as can be seen on this graph:
Countries with a lower GDP, like Bulgaria, have a lower fat consumption, as seen below:
There do not seem to be any trends in the graphs above and diets seem rather stable in each country, which is why we will take the average consumption for each macro-nutrient for our analysis. We can however note that the 5 outliers mentioned before tend to have a higher fat consumption than the countries with a smaller GDP.
We then wanted to analyse the relation between a country’s GDP and its consumption of each macronutrient, as well as its total calorie consumption, to see if there is a trend (total calories = orange, fat = blue, carbohydrates = purple, animal protein = red, plant protein = green).
We see that the calorie consumption does not really change. We wanted a close-up of the relation between total calorie consumption and GDP for each country to see if we could spot outliers again, so we created other plots.
We again end up with the same 5 outliers, which have a higher than average GDP. If we remove them, we obtain the following plots:
Now we can more easily state that there is a trend. It appears that the higher a country’s GDP, the higher the total calories consumed, contrary to our hypothesis.
Once again, we tried to see if the diabetes prevalence in each country changed over the years 2000-2013.
We saw right away that the prevalence of diabetes is higher for men than for women across all countries, with two exceptions: Romania between 2000 and 2003 and Slovenia between 2000 and 2006.
We observed three different scenarios among the countries that we selected. In the first, diabetes prevalence decreases over time. We take Belgium as an example:
In the second, prevalence decreases over time for women but not for men. We take Austria as an example:
In the other European countries, the prevalence of diabetes is increasing over time, at different paces. We take Croatia as an example:
Finally, we want to plot diabetes prevalence against total calorie consumption as well as against each type of macronutrient consumed.
We can see a negative trend for total consumption, calories from animal protein and calories from fat, and a positive trend for calories from plant protein. For calories from carbohydrates, we can see a slightly positive trend for women.
Now, since they affected the plots that included the GDP variable so much, we want to see whether the trends differ when we remove our 5 outliers.
Without our 5 outliers, the trends for each type of calories barely change, apart from carbohydrates, where the trend for men changes and becomes slightly positive.
This first question serves more as a control, since we learned during our research prior to our project that countries with higher GDPs tend to have lower diabetes rates. Indeed, we can observe that in the EDA.
It is important to note that, when we try to fit a linear model on these variables and observe correlations over all observations, we see that these relationships are not significant at all.
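Correlation between average GDP and diabetes prevalence over all 29 countries: -0.369 for women and -0.236 for men.

Linear model of average GDP on men’s diabetes prevalence, all countries:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 1.89e+12 | 1.09e+12 | 1.74 | 0.093 | . |
| GDP_diabetes_cal$prop_men_diabetes | -2.00e+13 | 1.58e+13 | -1.26 | 0.217 | |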
However, once we exclude the outliers, the relationship is far more significant.

Correlation between average GDP and diabetes prevalence without the 5 outliers: -0.696 for women and -0.739 for men.

Linear model of average GDP on women’s diabetes prevalence, without outliers:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 7.42e+11 | 1.24e+11 | 5.99 | <0.001 | *** |
| GDP_diabetes_cal2$prop_women_diabetes | -1.03e+13 | 2.26e+12 | -4.55 | <0.001 | *** |

Linear model of average GDP on men’s diabetes prevalence, without outliers:

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 1.15e+12 | 1.87e+11 | 6.12 | <0.001 | *** |
| GDP_diabetes_cal2$prop_men_diabetes | -1.40e+13 | 2.73e+12 | -5.15 | <0.001 | *** |
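The comparison with and without outliers can be re-run from the joined table. The sketch below copies the rounded GDP and diabetes columns of that table into a data frame, so its results may differ slightly from the figures reported above. The name `d` and the helper `cor_by_sex()` are illustrative, and the list of five outliers follows the countries named in the exploratory analysis.

```r
d <- data.frame(
  country_code = c("AUT", "BEL", "BGR", "HRV", "CYP", "CZE", "DNK", "EST", "FIN", "FRA",
                   "DEU", "GRC", "HUN", "IRL", "ITA", "LVA", "LTU", "LUX", "MLT", "NLD",
                   "POL", "PRT", "ROU", "SVK", "SVN", "ESP", "SWE", "CHE", "GBR"),
  avg_gdp = c(3.36e11, 4.07e11, 3.74e10, 4.81e10, 2.02e10, 1.58e11, 2.75e11, 1.65e10,
              2.17e11, 2.28e12, 3.00e12, 2.46e11, 1.11e11, 2.03e11, 1.87e12, 2.10e10,
              3.08e10, 4.28e10, 7.27e09, 7.22e11, 3.65e11, 2.00e11, 1.25e11, 7.08e10,
              3.95e10, 1.18e12, 4.26e11, 4.90e11, 2.42e12),
  prop_men_diabetes = c(0.053, 0.057, 0.073, 0.071, 0.077, 0.078, 0.055, 0.071, 0.066,
                        0.071, 0.059, 0.069, 0.080, 0.069, 0.065, 0.071, 0.078, 0.068,
                        0.088, 0.052, 0.074, 0.075, 0.062, 0.072, 0.066, 0.084, 0.058,
                        0.050, 0.063),
  prop_women_diabetes = c(0.034, 0.039, 0.064, 0.059, 0.056, 0.065, 0.035, 0.064, 0.044,
                          0.044, 0.040, 0.060, 0.063, 0.049, 0.047, 0.065, 0.069, 0.039,
                          0.066, 0.037, 0.066, 0.052, 0.059, 0.059, 0.065, 0.059, 0.040,
                          0.030, 0.049)
)

# The five high-GDP countries treated as outliers in the text.
outliers <- c("FRA", "DEU", "ITA", "ESP", "GBR")
trimmed <- d[!d$country_code %in% outliers, ]

# Pearson correlation of GDP with each sex's diabetes prevalence.
cor_by_sex <- function(df) {
  c(men   = cor(df$avg_gdp, df$prop_men_diabetes),
    women = cor(df$avg_gdp, df$prop_women_diabetes))
}
r_all  <- cor_by_sex(d)
r_trim <- cor_by_sex(trimmed)
print(round(rbind(all = r_all, without_outliers = r_trim), 3))

# Same model form as in the report: GDP as response, prevalence as predictor.
fit_men <- lm(avg_gdp ~ prop_men_diabetes, data = trimmed)
print(summary(fit_men)$coefficients)
```

Passing `data = trimmed` to `lm()` yields shorter term names than the `GDP_diabetes_cal2$...` labels shown in the tables, but the fitted model is of the same form.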
As mentioned in the first point, countries with a higher GDP have a lower diabetes rate, which could potentially be explained by the consumption of fewer calories.
But is there a real correlation between GDP and total calorie consumption? Let’s check.

Correlation between average GDP and total calorie consumption: 0.355.

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -3.02e+12 | 1.81e+12 | -1.67 | 0.106 | |
| GDP_diabetes_cal$total_consumption | 1.07e+09 | 5.46e+08 | 1.97 | 0.059 | . |
Neither the correlation between these two variables nor the linear regression is significant. However, it would be interesting to look further. To do this we can use the clustering method. To determine the number of clusters we use the elbow method. This method examines the percentage of variance explained as a function of the number of clusters. It is based on the idea that a number of clusters should be chosen such that the addition of another cluster does not allow for a better modeling of the data. The percentage of variance explained by the clusters is plotted against the number of clusters.
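The elbow method described above can be sketched with base R's `kmeans()`. The data here are synthetic (two well-separated groups generated for the example), because the text does not say which variables were clustered; the range of k and the `nstart` value are also assumptions.

```r
set.seed(42)

# Synthetic data: two well-separated groups of 20 points in 2 dimensions.
x <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 10), ncol = 2)
)
x <- scale(x)  # put all variables on the same scale before clustering

# Total within-cluster sum of squares for k = 1..6.
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Share of variance explained by the clustering, relative to a single cluster.
explained <- 1 - wss / wss[1]

plot(ks, explained, type = "b",
     xlab = "Number of clusters", ylab = "Share of variance explained")
```

The elbow is the value of k after which the curve flattens. With these synthetic data that happens at k = 2, since almost all of the variance lies between the two groups.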
We therefore see from the graph above that the optimal number of clusters is 2. The allocation of countries according to their cluster is therefore as follows:
| Country | Cluster |
|---|---|
| AUT | 2 |
| BEL | 2 |
| BGR | 2 |
| HRV | 2 |
| CYP | 2 |
| CZE | 2 |
| DNK | 2 |
| EST | 2 |
| FIN | 2 |
| FRA | 1 |
| DEU | 1 |
| GRC | 2 |
| HUN | 2 |
| IRL | 2 |
| ITA | 1 |
| LVA | 2 |
| LTU | 2 |
| LUX | 2 |
| MLT | 2 |
| NLD | 2 |
| POL | 2 |
| PRT | 2 |
| ROU | 2 |
| SVK | 2 |
| SVN | 2 |
| ESP | 2 |
| SWE | 2 |
| GBR | 2 |
| CHE | 1 |
To get a better idea of the difference between the clusters, we plot the means of each dimension in each group. Repeating the clustering with 3 clusters gives the following allocation:

| Country | Cluster |
|---|---|
| AUT | 2 |
| BEL | 1 |
| BGR | 2 |
| HRV | 2 |
| CYP | 2 |
| CZE | 2 |
| DNK | 2 |
| EST | 2 |
| FIN | 2 |
| FRA | 3 |
| DEU | 3 |
| GRC | 2 |
| HUN | 2 |
| IRL | 2 |
| ITA | 3 |
| LVA | 2 |
| LTU | 2 |
| LUX | 2 |
| MLT | 2 |
| NLD | 1 |
| POL | 2 |
| PRT | 2 |
| ROU | 2 |
| SVK | 2 |
| SVN | 2 |
| ESP | 1 |
| SWE | 1 |
| GBR | 1 |
| CHE | 3 |
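Cluster allocations and per-cluster means like those discussed here could be obtained as sketched below. The six countries and their values come from the joined table, but clustering on `avg_gdp` and `cal_fat` only, and using 2 clusters, are assumptions made for the example: the text does not list the variables that were actually used.

```r
# Six countries from the joined table; the choice of variables is illustrative.
d <- data.frame(
  country_code = c("AUT", "BGR", "FRA", "DEU", "CYP", "ITA"),
  avg_gdp = c(3.36e11, 3.74e10, 2.28e12, 3.00e12, 2.02e10, 1.87e12),
  cal_fat = c(1454, 846, 1480, 1276, 1019, 1390)
)

# Standardise first so that GDP (in US$) does not dominate the distances.
features <- scale(d[c("avg_gdp", "cal_fat")])

set.seed(1)
fit <- kmeans(features, centers = 2, nstart = 25)
d$cluster <- fit$cluster

# Mean of each variable within each cluster, on the original scale.
cluster_means <- aggregate(d[c("avg_gdp", "cal_fat")],
                           by = list(cluster = d$cluster), FUN = mean)
print(d[c("country_code", "cluster")])
print(cluster_means)
```

Cluster numbers are arbitrary labels. Two runs can produce the same grouping with the numbers swapped, so only the grouping itself should be compared.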
We then plot those 3 clusters to see the differences:
The table shows that there is a better distribution of GDP, so we again make a graph to compare the differences between the clusters. Oddly enough, it seems that a higher consumption of animal protein could be related to a lower rate of diabetes, which would run counter to M. Adeva-Andany’s (2019) article “Dietary habits contribute to define the risk of type 2 diabetes in humans”. We are going to investigate this very issue of calorie consumption patterns that could be related to low diabetes rates.
We observed during the EDA that richer countries seemed to consume more fat on average. Now we want to see if we can confirm this relationship, and check whether the average GDP of a country is also correlated with the consumption of other macronutrients.
| Macronutrient | Correlation with average GDP |
|---|---|
| Calories from animal protein | 0.272 |
| Calories from plant protein | 0.0433 |
| Calories from carbohydrates | -0.0828 |
| Calories from fat | 0.503 |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -7.16e+11 | 8.61e+11 | -0.832 | 0.413 | |
| GDP_diabetes_cal$cal_prot_animal | 5.28e+09 | 3.59e+09 | 1.470 | 0.153 | |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 2.76e+11 | 1.14e+12 | 0.243 | 0.809 | |
| GDP_diabetes_cal$cal_prot_plant | 1.52e+09 | 6.75e+09 | 0.225 | 0.823 | |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 1.21e+12 | 1.59e+12 | 0.764 | 0.452 | |
| GDP_diabetes_cal$cal_carbs | -3.93e+08 | 9.11e+08 | -0.432 | 0.669 | |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -1.94e+12 | 8.27e+11 | -2.35 | 0.027 | * |
| GDP_diabetes_cal$cal_fat | 2.13e+09 | 7.03e+08 | 3.03 | 0.005 | ** |
The relationship between the average GDP of a country and the calories consumed from fat per person is significant (p = 0.005) but only moderate (correlation of 0.503), so it is not as strong as we may have thought. For the other macronutrients, there seems to be no correlation at all.
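Running the same correlation and regression for every macronutrient is easier in a loop than with four separate calls. The sketch below uses six countries from the joined table as a small illustrative subset, so its numbers will not match the full-sample results above.

```r
# Subset of the joined table (six countries), for illustration only.
d <- data.frame(
  country_code    = c("AUT", "BGR", "FRA", "DEU", "CYP", "ITA"),
  avg_gdp         = c(3.36e11, 3.74e10, 2.28e12, 3.00e12, 2.02e10, 1.87e12),
  cal_prot_animal = c(245, 155, 293, 240, 197, 241),
  cal_prot_plant  = c(169, 168, 161, 159, 127, 204),
  cal_carbs       = c(1833, 1606, 1611, 1785, 1291, 1776),
  cal_fat         = c(1454, 846, 1480, 1276, 1019, 1390)
)
macros <- c("cal_prot_animal", "cal_prot_plant", "cal_carbs", "cal_fat")

# Pearson correlation of average GDP with each macronutrient.
cors <- sapply(macros, function(v) cor(d$avg_gdp, d[[v]]))

# p-value of the slope in a simple regression of GDP on each macronutrient.
p_values <- sapply(macros, function(v) {
  fit <- lm(reformulate(v, response = "avg_gdp"), data = d)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
})

print(round(cbind(correlation = cors, p_value = p_values), 3))
```

In a simple regression with one predictor, the slope's p-value is the same as the p-value of the correlation test, which is why each pair of results above tells the same story.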
Another way to answer this research question can be to see the relationship between the wealth of a country and the proportions of the total calories consumed dedicated to each macronutrient.
| Macronutrient | Correlation of its share of total calories with average GDP |
|---|---|
| Animal protein | 0.147 |
| Plant protein | -0.186 |
| Carbohydrates | -0.437 |
| Fat | 0.428 |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -2.73e+11 | 1.05e+12 | -0.260 | 0.797 | |
| GDP_diabetes_cal$proportion_animal_prot | 1.13e+13 | 1.46e+13 | 0.771 | 0.447 | |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 1.74e+12 | 1.24e+12 | 1.403 | 0.172 | |
| GDP_diabetes_cal$proportion_plant_prot | -2.40e+13 | 2.44e+13 | -0.984 | 0.334 | |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 4.81e+12 | 1.70e+12 | 2.83 | 0.009 | ** |
| GDP_diabetes_cal$proportion_carbs | -8.14e+12 | 3.22e+12 | -2.53 | 0.018 | * |

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -2.27e+12 | 1.15e+12 | -1.98 | 0.058 | . |
| GDP_diabetes_cal$proportion_fat | 7.97e+12 | 3.24e+12 | 2.46 | 0.02 | * |
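With proportions, the correlation with GDP is stronger than with calorie counts for carbohydrates (-0.437 instead of -0.0828) and plant protein (-0.186 instead of 0.0433), and the regression coefficient for carbohydrates becomes significant. The correlations are weaker, however, for fat (0.428 instead of 0.503) and animal protein (0.147 instead of 0.272), and overall the relationships are still not strong enough.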
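The proportions used in this comparison are each macronutrient's share of the country's total calories. A minimal sketch follows, again on six countries from the joined table. The `proportion_*` names are taken from the regression tables above, and the total is recomputed as the sum of the four macronutrient columns.

```r
# Subset of the joined table (six countries), for illustration only.
d <- data.frame(
  country_code    = c("AUT", "BGR", "FRA", "DEU", "CYP", "ITA"),
  avg_gdp         = c(3.36e11, 3.74e10, 2.28e12, 3.00e12, 2.02e10, 1.87e12),
  cal_prot_animal = c(245, 155, 293, 240, 197, 241),
  cal_prot_plant  = c(169, 168, 161, 159, 127, 204),
  cal_carbs       = c(1833, 1606, 1611, 1785, 1291, 1776),
  cal_fat         = c(1454, 846, 1480, 1276, 1019, 1390)
)
macros <- c("cal_prot_animal", "cal_prot_plant", "cal_carbs", "cal_fat")

# Share of each macronutrient in the country's total calories.
total <- rowSums(d[macros])
props <- d[macros] / total
names(props) <- c("proportion_animal_prot", "proportion_plant_prot",
                  "proportion_carbs", "proportion_fat")

# Correlation of each share with average GDP.
prop_cors <- sapply(props, function(p) cor(d$avg_gdp, p))
print(round(prop_cors, 3))
```

Because the four shares sum to 1 for every country, they are not independent: a higher fat share necessarily means a lower share of something else, which is worth keeping in mind when reading the four correlations together.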